AGE- AND TIME-VARYING PROPORTIONAL HAZARDS MODELS
FOR EMPLOYMENT DISCRIMINATION 

By George Woodworth and Joseph Kadane 

University of Iowa and Carnegie Mellon University 

We use a discrete-time proportional hazards model of time to in- 
voluntary employment termination. This model enables us to exam- 
ine both the continuous effect of the age of an employee and whether 
that effect has varied over time, generalizing earlier work [Kadane and 
Woodworth J. Bus. Econom. Statist. 22 (2004) 182-193]. We model 
the log hazard surface (over age and time) as a thin-plate spline, 
a Bayesian smoothness-prior implementation of penalized likelihood 
methods of surface-fitting [Wahba (1990) Spline Models for Observational
Data. SIAM]. The nonlinear component of the surface has only
two parameters, smoothness and anisotropy. The first, a scale param- 
eter, governs the overall smoothness of the surface, and the second, 
anisotropy, controls the relative smoothness over time and over age. 
For any fixed value of the anisotropy parameter, the prior is equiva- 
lent to a Gaussian process with linear drift over the time-age plane 
with easily computed eigenvectors and eigenvalues that depend only 
on the configuration of data in the time-age plane and the anisotropy 
parameter. This model has application to legal cases in which a com- 
pany is charged with disproportionately disadvantaging older workers 
when deciding whom to terminate. We illustrate the application of 
the modeling approach using data from an actual discrimination case. 

1. Introduction. Federal law prohibits discrimination in employment de- 
cisions on the basis of age. There are two different bases on which a case 
may be brought alleging age discrimination. First, in a disparate impact 
case, the intent of the defendant is not at issue, but only the effect of the 
defendant's actions on the protected class, namely, those forty or older. For 
example, a rule requiring new hires to have attained bachelor's degrees af- 
ter 1995 would be facially neutral, but would have the effect of preventing 
the hiring of older applicants. For such a case, data analysis is essential to 
see whether the data support disproportionate disadvantage to persons over 
40 years of age with respect to whatever employment practices might be
in question. Those practices might include hiring, salary, promotion and/or 
involuntary termination. A disparate treatment case, by contrast, claims in- 
tentional discrimination on the basis of age. Malevolent action, as well as 
intention, must be shown in a disparate treatment case. While statistics can 
address the defendant's actions in a disparate treatment case, usually intent 
is beyond what data alone can address. 

This paper uses a proportional hazards model as the likelihood [Cox 
(1972)]. Finkelstein and Levin (1994) used such a model with the positive
part of (age − 40) as an explanatory variable.
Kadane and Woodworth (2004) treat time as a continuous variable, but do
not model the response as a continuous function of age. This paper models
both age and time continuously. This choice enables us to examine both the 
effect of age of an employee on employment decisions (our example uses in- 
voluntary terminations) and whether that effect has varied over time. Hence, 
there are two continuous variables, time and the age of the employee. In this 
way, the work here generalizes our earlier work [Kadane and Woodworth
(2004)], which allowed continuous time but reduced age to a few discrete
categories. The analysis presented here allows us to address the
extent to which a pattern or practice of age-based discrimination extends 
over a period of time. Proportional hazards regression is particularly suited 
to a pattern or practice case because it concerns the probability or odds 
of a person of a given age being involuntarily terminated relative to that 
of a person of another age (or range of ages), and hence directly addresses 
whether an older person is disproportionately disadvantaged. 

We choose to use Bayesian inference because we find that it directly gives 
the probability that a person of a given age at a particular time is more 
likely to be fired than another person of a given other age at the same time. 
This contrasts with sampling-theory methods that give probabilities in the 
sample space, even after the sample is observed [Kadane (1990a)]. When 
combined with sensitivity analysis, Bayesian analysis permits us to assess 
the relative influence of the data and the model. We undertook the line of 
research in Kadane and Woodworth (2004) and in this paper to deal with 
temporally-sparse employment actions taken over a long time period. We 
particularly wanted to avoid the need to aggregate data into arbitrary time 
periods — months, quarters, years, etc. — in order to apply Cochran-Armitage 
type tests and the like. 

2. Proportional hazards regression. The data required to analyze age 
discrimination in involuntary terminations comprise the beginning and end- 
ing dates of each employee's period(s) of employment, that employee's birth 
date, and the reason advanced by the employer for separation from employment
(if it occurred). Table 1, below, is a fragment of the data analyzed in Section 3.

Table 1

Flow data for the period June 1, 1989 to December 31, 1993

Birth date    Entry date    Separation date    Reason
3/1/1925      3/1/1961      6/1/1990           Volᵃ
4/9/1938      4/8/1961      8/17/1992          Vol
10/17/1934    4/5/1962      6/3/1992           Invol
12/9/1939     4/7/1962      12/18/1991         Invol
11/29/1932    5/29/1962     8/26/1989          Invol
9/5/1928      10/27/1962    6/12/1991          Vol
5/31/1941     1/12/1963     n/a                n/a

ᵃ "Voluntary" termination includes death and retirement.

Data were obtained for all persons employed by a firm at any time between
06/07/1989 and 11/21/1993. The tenure of the last employee shown is right
censored; that is, that employee was still in the work force as of 12/31/1993,
and we are consequently unable to determine the time or cause of his or her
eventual separation from the firm (involuntary termination, death,
retirement, etc.).

2.1. Overview. The purpose of our statistical analysis is to determine 
how an employee's risk of termination depends on his or her age and how the 
risk for employees of a given age changes with time. The idea is to estimate a 
surface such as the one in Figure 1 in such a way that it balances a penalty for
infidelity to the data against a penalty for a surface that is unrealistically
"rough" [Gersch (1982)]. The result is a surface that is generally within
the margins of sampling error but is also smooth. Smoothness, generally
speaking, amounts to not having areas of high curvature (i.e., spikes, cliffs,
buttes, sharp creases, etc.). The aim is to get a good fit to the data without
sacrificing smoothness.

The mesh surface in Figure 1 is derived from a thin-plate spline model of 
the log odds (logit) of the probability of involuntary termination at a given 
time and age. The vertical axis shows the posterior median log-odds ratio 
of termination for employees of a given age on a given date relative to the 
weighted average rate for employees aged 39 years or younger on the same 
date (the legally unprotected class often used by statistical experts as a 
reference class for claims of disparate impact¹). The gray plane corresponds
to odds ratios equal to 1.00, indicating no age discrimination relative to the
reference class; points above this plane exhibit discrimination. Although the
underlying thin-plate spline is smooth, the log-odds ratio surface is locally
slightly rough because the observed numbers of employees in each age bin
at the time of each termination were used as weights in computing the
termination rate in the reference class.

The black ribbon in Figure 1 is the trajectory of the log-odds ratio over
time for employees aged 56-57, and the dashed ribbon is the log-odds ratio
as a function of age on day 1121 (05/30/92), the date of the involuntary
termination of 57-year-old plaintiff W1 in Case W described in Kadane and
Woodworth (2004). The height of the surface at their intersection (0.297)
is the posterior median log-odds ratio for involuntary termination of
employees aged 56-57 relative to those under 40 on that date.

Figure 2 shows the posterior probability of age discrimination relative to
under-40 employees as a function of age and date. Points above the gray
plane represent dates and ages at which there was at least 70% posterior
probability of age discrimination. By itself, this would be comparatively
weak evidence; however, Kadane (1990b), commenting on empirical research
by Mosteller and Youtz (1990), suggests that this level of probability could,
in standard usage, be said to make it "likely" that discrimination had
occurred. The height of the surface at the intersection of the dashed and black
ribbons (0.79) is the posterior probability that employees aged 56-57 were
terminated at a higher rate than under-40 employees.

Fig. 2. Probability of age discrimination relative to under-40 employees.

¹ Note, however, that Mr. Justice Scalia's majority opinion in O'Connor v. Consolidated
Coin Caterers Corp., 517 U.S. 308 (1996) states that "though the prohibition is limited
to individuals who are at least 40 years of age, §631(a). This language does not ban
discrimination against employees because they are aged 40 or older; it bans discrimination
against employees because of their age, but limits the protected class to those who are 40
or older. The fact that one person in the protected class has lost out to another person in
the protected class is thus irrelevant, so long as he has lost out because of his age."
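Figures 1 and 2 report, for each (date, age) cell, a posterior median of the log-odds ratio and a posterior probability that the odds ratio exceeds 1. Given Monte Carlo draws of the log-odds ratio for one cell, both summaries are simple to compute. The sketch below is illustrative only; the function name and the example draws are hypothetical, not taken from the analysis.

```python
from statistics import median


def summarize_cell(log_or_draws):
    """Return (posterior median of the log-odds ratio, posterior P(OR > 1))
    from Monte Carlo draws for a single (date, age) cell."""
    if not log_or_draws:
        raise ValueError("need at least one draw")
    # Fraction of draws with log OR > 0, i.e. odds ratio > 1.
    prob_or_above_1 = sum(d > 0.0 for d in log_or_draws) / len(log_or_draws)
    return median(log_or_draws), prob_or_above_1


# Hypothetical draws for one cell, for illustration only.
med, prob = summarize_cell([-0.1, 0.2, 0.3, 0.5])
print(f"median log OR = {med:.2f}, P(OR > 1) = {prob:.2f}")
```

The same two numbers, computed cell by cell over the time-age grid, give the heights of the surfaces plotted in the two figures.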

2.2. Proportional hazards models for time to event data. We are analyz- 
ing a group of individuals at risk for a particular type of failure (involuntary 
termination) for all or part of an observation period. The $j$th person enters
the risk set at time $h_j$ (either his/her date of hire or the beginning of the
observation period) and leaves the risk set at time $T_j$, either by failure
(involuntary termination) or for other reasons (death, voluntary resignation,
reassignment, retirement), unless he or she is still employed at the end of the
observation period. The survival function $S_j(t) = P(T_j > t)$ is the probability
that the $j$th employee is still employed at time $t$.

In practice, we rescale time and age to the unit interval [0, 1] and, to make
computations tractable, discretize each to a finite grid: $0 = t_0 < t_1 < \cdots < t_p = 1$
and $0 = a_0 < a_1 < \cdots < a_r = 1$. Let $p_{iw}$ be the conditional probability that
employee (worker) $w$ is terminated in the interval $(t_{i-1}, t_i]$, given the
parameters and given that (s)he was in the workforce at time $t_{i-1}$. The discretized
data for this employee are $f_{1w}, \ldots, f_{pw};\ r_{1w}, \ldots, r_{pw}$, where $r_{iw} = 1$ (0) if the
employee was (not) in the work force (risk set) at time $t_{i-1}$, and $f_{iw} = 1$ (0)
if the worker was (not) involuntarily terminated (fired) in that interval. The
joint likelihood for all employees is

$$\prod_{w=1}^{W}\prod_{i=1}^{p} p_{iw}^{f_{iw}}\,(1 - p_{iw})^{r_{iw} - f_{iw}},$$

where $W$ is the total number of employees. Letting $a_w(t)$ denote the age of employee
$w$ at time $t$, we use the natural parametrization $\operatorname{logit}(p_{iw}) = \beta(t_i, a_w(t_i))$,
where $\beta(t, a)$ is a smooth function of time and age.
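The indicators $r_{iw}$ and $f_{iw}$ can be produced mechanically from the flow data of Table 1 once entry and exit dates are rescaled to the time grid. The sketch below assumes, as in the definition above, that an employee is at risk in interval $i$ only if already in the work force at $t_{i-1}$, so someone hired mid-interval is first counted at the next grid point; the function name and arguments are hypothetical.

```python
def discretize_employee(entry, exit_time, fired, grid):
    """Risk-set indicators r_i and termination indicators f_i for one employee.

    grid: increasing rescaled times t_0 < t_1 < ... < t_p.
    entry: rescaled time the employee enters the risk set.
    exit_time: rescaled time the employee leaves it, or None if still employed (censored).
    fired: True if the exit was an involuntary termination.
    Interval i (1-based) is (t_{i-1}, t_i].
    """
    r, f = [], []
    for lo, hi in zip(grid[:-1], grid[1:]):
        # In the work force at the start of the interval?
        at_risk = entry <= lo and (exit_time is None or exit_time > lo)
        # Fired during this interval (only possible while at risk)?
        failed = at_risk and fired and exit_time is not None and lo < exit_time <= hi
        r.append(int(at_risk))
        f.append(int(failed))
    return r, f


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(discretize_employee(0.1, 0.6, True, grid))    # hired after t_0, fired in (0.5, 0.75]
print(discretize_employee(0.0, None, False, grid))  # right censored
```

Summing these indicators over employees within each age bin gives the aggregated counts used in the binned likelihood that follows.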

The aggregated data $n_{ij}$ and $x_{ij}$ are, respectively, the number of employees
with ages in the interval $[a_{j-1}, a_j)$ at time $t_i$ and the number of those who
were terminated in that interval. At this level of aggregation, the likelihood
is

$$l(\beta) = \prod_{i=1}^{p}\prod_{j=1}^{r} \exp\{\beta_{ij}x_{ij} - n_{ij}\ln(1 + \exp(\beta_{ij}))\}, \tag{2.1}$$

where $\beta_{ij} = \beta(t_i, a_j)$. We assume that the grid is fine enough and the function
smooth enough that variation of $\beta$ within a grid cell is negligible. Changing
the grid requires recomputing the cell counts $(n_{ij}, x_{ij})$ and the basis vectors
defined below, which is fairly time consuming. We did a few runs with a
grid roughly twice as fine (which quadrupled the run time and storage
requirements) without observing substantive changes in the results; however,
we focused our sensitivity analysis on varying the prior distribution of the
smoothness parameter, which appeared to have much greater impact on the
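results. We compute the log-odds ratio at time $t_i$ for employees aged $a_j$
relative to unprotected employees (i.e., employees under age 40) as

$$\beta_{ij} - \operatorname{logit}\left(\frac{\sum_{u:\,\mathrm{age}_u < 40} n_{iu}\,p_{iu}}{\sum_{u:\,\mathrm{age}_u < 40} n_{iu}}\right), \tag{2.2}$$

where $\mathrm{age}_u$ is the age in years corresponding to the scaled value $a_u$, and
$\operatorname{logit}(p_{ij}) = \beta_{ij}$.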
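The binned likelihood (2.1) and the log-odds ratio against the under-40 reference class are both short computations on the cell arrays. The sketch below takes the reference rate to be the head-count-weighted average of the under-40 termination probabilities at the same time point, as described in Section 2.1; the function names, nested-list layout, and the `cutoff` argument are our own illustrative choices.

```python
import math


def log1pexp(b):
    """Numerically stable log(1 + exp(b))."""
    return b + math.log1p(math.exp(-b)) if b > 0 else math.log1p(math.exp(b))


def log_likelihood(beta, n, x):
    """Log of (2.1): sum over cells of x_ij * beta_ij - n_ij * log(1 + exp(beta_ij)).

    beta, n, x: nested lists indexed [i][j] (time bin, age bin)."""
    return sum(
        xx * b - nn * log1pexp(b)
        for b_row, n_row, x_row in zip(beta, n, x)
        for b, nn, xx in zip(b_row, n_row, x_row)
    )


def log_odds_ratio(beta_row, n_row, ages, j, cutoff=40):
    """Log-odds ratio at one time point for age bin j versus the bins with age < cutoff,
    whose termination probabilities are averaged with head counts n_row as weights."""
    ref = [u for u, age in enumerate(ages) if age < cutoff]
    total = sum(n_row[u] for u in ref)
    if total <= 0:
        raise ValueError("reference class is empty at this time point")
    p_ref = sum(n_row[u] / (1.0 + math.exp(-beta_row[u])) for u in ref) / total
    return beta_row[j] - math.log(p_ref / (1.0 - p_ref))
```

If every bin shares the same logit, the reference probability equals that common probability and the log-odds ratio is 0 for every age bin, which is a quick sanity check on the weighting.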

2.3. Thin-plate spline smoothness priors. Likelihood measures fidelity to 
data (the larger the better); however, it does not incorporate our belief that 
the hazard ratio varies comparatively smoothly with time and age; this is 
provided by a roughness penalty (the smaller the better) that is subtracted 
from the log-likelihood.

The smoothness parameter, $\lambda$, weights the importance of smoothness relative
to fidelity to noisy data (larger values of the smoothness parameter
produce smoother fitted surfaces). However, there is no reason to expect
the log odds to be isotropic (equally smooth in time and age), and for
that reason we assume that there is a rescaling $T = t/\sqrt{1+\rho^2}$ and
$A = \rho a/\sqrt{1+\rho^2}$ such that the function $b(T, A) = \beta(T\sqrt{1+\rho^2},\ A\sqrt{1+\rho^2}/\rho)$ is
equally smooth (isotropic) in $A$ and $T$. That is, the roughness penalty is

$$\lambda\iint\left[\left(\frac{\partial^2 b(T,A)}{\partial T^2}\right)^2 + 2\left(\frac{\partial^2 b(T,A)}{\partial T\,\partial A}\right)^2 + \left(\frac{\partial^2 b(T,A)}{\partial A^2}\right)^2\right]dT\,dA, \tag{2.4}$$
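which reduces to the anisotropic roughness penalty

$$\tilde\lambda\iint\left[\left(\frac{\rho^2}{1+\rho^2}\frac{\partial^2\beta(t,a)}{\partial t^2}\right)^2 + 2\left(\frac{\rho}{1+\rho^2}\frac{\partial^2\beta(t,a)}{\partial t\,\partial a}\right)^2 + \left(\frac{1}{1+\rho^2}\frac{\partial^2\beta(t,a)}{\partial a^2}\right)^2\right]dt\,da, \tag{2.5}$$

where $\rho$ is called the anisotropy parameter and $\lambda = \tilde\lambda\rho^3/(1+\rho^2)^3$. When
$\rho = 1$ the surface is isotropic, and as $\rho \to \infty$ (or $\rho \to 0$), there is relatively
less constraint on roughness in the age (or time) dimension.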
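The step from (2.4) to (2.5) is a change of variables, and it can be checked numerically. The sketch below uses an arbitrary cubic test surface of our own choosing (not fitted to any data), integrates the isotropic integrand over the rescaled $(T, A)$ image of the unit square and the anisotropic integrand over the unit $(t, a)$ square, and confirms that the two integrals differ by the factor $(1+\rho^2)^3/\rho^3$, so the multiplier of the anisotropic form equals $\lambda(1+\rho^2)^3/\rho^3$. Finite differences and a midpoint rule stand in for exact derivatives and integrals.

```python
import math


def second_partials(f, x, y, h=1e-3):
    """Central-difference estimates of (f_xx, f_xy, f_yy) at (x, y);
    exact for cubic polynomials up to rounding error."""
    f0 = f(x, y)
    fxx = (f(x + h, y) - 2.0 * f0 + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f0 + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def roughness_integrals(beta, rho, n=60):
    """Midpoint-rule values of the bracketed integral in (2.4), over the image of the
    unit (t, a) square, and of the bracketed integral in (2.5), over that square."""
    c = math.sqrt(1.0 + rho**2)

    def b(T, A):  # b(T, A) = beta(t, a) expressed in the rescaled coordinates
        return beta(T * c, A * c / rho)

    iso = aniso = 0.0
    for k in range(n):
        for m in range(n):
            t, a = (k + 0.5) / n, (m + 0.5) / n
            # Isotropic integrand at (T, A) = (t/c, rho*a/c); mapped cell area rho/(c^2 n^2).
            bTT, bTA, bAA = second_partials(b, t / c, rho * a / c)
            iso += (bTT**2 + 2.0 * bTA**2 + bAA**2) * rho / (c**2 * n**2)
            # Anisotropic integrand at (t, a); cell area 1/n^2.
            btt, bta, baa = second_partials(beta, t, a)
            aniso += ((rho**2 / c**2 * btt)**2 + 2.0 * (rho / c**2 * bta)**2
                      + (baa / c**2)**2) / n**2
    return iso, aniso


def cubic(t, a):  # arbitrary smooth test surface
    return t**3 + t**2 * a + a**3


for rho in (0.5, 1.0, 4.0):
    iso, aniso = roughness_integrals(cubic, rho)
    print(rho, iso / aniso, (1 + rho**2)**3 / rho**3)
```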

It is interesting to compare this model to the earlier one of Finkelstein
and Levin (1994), which is a special case of ours. In their case, our function
$\beta(\cdot,\cdot)$ takes the form

$$\beta(t_i, a_w(t_i)) = (a_w(t_i) - 40)_+.$$

Since that function has zero second partial derivatives (except at 40, where 
they do not exist), their function imposes smoothness in our sense. One 
could think of this computationally as setting A = 0. 

Since the likelihood depends on the smooth function $\beta(t, a)$ only through
the values $\beta_{ij}$, the roughness penalty is minimized for fixed $\beta_{ij}$ when $\beta(t, a)$
is the interpolating thin-plate spline with values $\beta(t_i, a_j) = \beta_{ij}$. We have
from Wahba [(1990), page 31, equation (2.4.9)] that there exist coefficients
$c$ such that the isotropic thin-plate spline $b(T, A)$ can be represented as

$$b(T, A) = \sum_{i,j} c_{ij}\,H(T - T_i,\ A - A_j) + l(T, A), \tag{2.6}$$

where $l(T, A)$ is an arbitrary linear function, $H(v) = \|v\|^2\ln(\|v\|)/(8\pi)$, and
the coefficients satisfy the conditions $\sum_{ij} c_{ij} = \sum_{ij} t_i c_{ij} = \sum_{ij} a_j c_{ij} = 0$.
Then the isotropic roughness penalty, equation (2.4), reduces to $\lambda c'K_\rho c$,
where $c$ is the vector of coefficients and $K_\rho$ is the $pr \times pr$ symmetric matrix
with elements of the form

$$k_{ij,uv} = H(T_i - T_u,\ A_j - A_v) = H\left(\frac{t_i - t_u}{\sqrt{1+\rho^2}},\ \frac{\rho(a_j - a_v)}{\sqrt{1+\rho^2}}\right).$$

To accommodate the constraints on vector $c$, let $P$ be the projection onto
the linear space orthogonal to the constraints, so that $c = Pc$.
Finally, let $PK_\rho P = U_\rho\Lambda_\rho U_\rho'$ be the spectral decomposition of $PK_\rho P$
and define the basis vectors $B_\rho$ as the nonzero columns of $U_\rho\Lambda_\rho^{1/2}$. It follows
that the model for the vector of logits is

$$\begin{aligned}\beta &= K_\rho c + L\phi\\ &= K_\rho Pc + L\phi\\ &= PK_\rho Pc + (I - P)K_\rho Pc + L\phi,\end{aligned} \tag{2.7}$$

where $\beta$ is the vector with $ij$th element $\beta_{ij}$ and the $ij$th row of matrix $L$
is $(1, t_i, a_j)$. But $I - P$ is the projection onto the column space of $L$ and,
consequently, $(I - P)K_\rho Pc$ can be absorbed into the linear term (redefining $\phi$
accordingly). Therefore, the model reduces to

$$\begin{aligned}\beta &= PK_\rho Pc + L\phi\\ &= U_\rho\Lambda_\rho^{1/2}(\Lambda_\rho^{1/2}U_\rho' c) + L\phi\\ &= B_\rho\delta + L\phi,\end{aligned} \tag{2.8}$$

where $\delta = \Lambda_\rho^{1/2}U_\rho' c$ and $B_\rho = U_\rho\Lambda_\rho^{1/2}$. Thus, for a given anisotropy $\rho$, the
columns of $B_\rho$ are basis vectors for the nonlinear part of the logit vector $\beta$.
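The construction of $K_\rho$, the projection $P$ and the basis $B_\rho$ can be sketched numerically with NumPy. The sketch below works on a small $(t, a)$ grid, builds $P$ from a QR factorization of $L$, and keeps the leading eigenvectors of $PK_\rho P$ that account for a chosen share of the eigenvalue sum (Section 3.2 uses 95%). The QR route, the numerical-zero threshold, and the function names are our choices, not the authors' SAS/IML implementation.

```python
import numpy as np


def tps_kernel(r2):
    """H(v) = |v|^2 ln|v| / (8 pi), written in terms of r2 = |v|^2, with H(0) = 0."""
    out = np.zeros_like(r2)
    pos = r2 > 0
    out[pos] = r2[pos] * 0.5 * np.log(r2[pos]) / (8.0 * np.pi)
    return out


def anisotropic_basis(t, a, rho, var_frac=0.95):
    """Return (B, L): B has one column per retained eigenvector of P K_rho P, scaled by
    the square root of its eigenvalue; L is the linear design with rows (1, t_i, a_j)."""
    tt, aa = np.meshgrid(np.asarray(t, float), np.asarray(a, float), indexing="ij")
    tt, aa = tt.ravel(), aa.ravel()
    c = np.sqrt(1.0 + rho**2)
    T, A = tt / c, rho * aa / c                          # isotropic coordinates
    r2 = (T[:, None] - T[None, :])**2 + (A[:, None] - A[None, :])**2
    K = tps_kernel(r2)
    L = np.column_stack([np.ones_like(tt), tt, aa])
    Q, _ = np.linalg.qr(L)                               # orthonormal basis of col(L)
    P = np.eye(len(tt)) - Q @ Q.T                        # projection orthogonal to col(L)
    M = P @ K @ P
    evals, U = np.linalg.eigh(0.5 * (M + M.T))           # symmetrize against rounding
    order = np.argsort(evals)[::-1]
    evals, U = np.clip(evals[order], 0.0, None), U[:, order]
    keep = evals > 1e-10 * evals[0]                      # drop numerically zero eigenvalues
    evals, U = evals[keep], U[:, keep]
    share = np.cumsum(evals) / evals.sum()
    q = min(int(np.searchsorted(share, var_frac)) + 1, len(evals))
    return U[:, :q] * np.sqrt(evals[:q]), L


B, L = anisotropic_basis(np.linspace(0, 1, 8), np.linspace(0, 1, 6), rho=1.2)
print(B.shape)
```

Because every retained column lies in the range of $P$, the basis is orthogonal to the linear term $L\phi$, which is what lets $\delta$ and $\phi$ be updated separately.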

The roughness penalty is $\lambda c'K_\rho c = \lambda c'PK_\rho Pc = \lambda c'U_\rho\Lambda_\rho U_\rho' c = \lambda\delta'\delta$.
The standard Bayesian interpretation of penalized likelihood estimation is
that the penalty function corresponds to the negative log of the prior density
of $\delta$ (up to an additive constant). Consequently, the components of that
vector are a priori independent and identically distributed normal random
variables with precision $\lambda$. It follows that the prior conditional variance of $\beta$
given $\lambda$, $\rho$ and $\phi$ is

$$\operatorname{Var}(B_\rho\delta) = \lambda^{-1}B_\rho B_\rho' = \lambda^{-1}PK_\rho P$$

and, consequently, if $d$ is a vector such that $d'L = 0$, then

$$\operatorname{Var}(d'\beta) = \lambda^{-1}d'K_\rho d. \tag{2.9}$$

The posterior distributions of A and p are not well identified by the data 
and it is necessary to be somewhat careful about specifying their priors. 
However, the regression coefficients, $\phi$, of the linear component do not
influence smoothness, are well identified by the data, and can be given diffuse,
normal prior distributions.

Viewing both time and age as continuous variables allows a more precise 
and general view of a firm's policy. However, due to the comparative sparse- 
ness of the data, some constraint on or penalty for roughness is needed to 
avoid an unrealistically rough model, unlike that depicted in Figure 1. It 
is, of course, possible to introduce discrete discontinuities into an otherwise 
smooth model at time points where there is other evidence of a shift in
employment practices [see, e.g., Figure 6 in Kadane and Woodworth (2004)].
However, we do not think that it is appropriate to "mine" for unknown num- 
bers of discontinuities at unknown time points in the sparse data common in 
age-discrimination cases. Hence, it is necessary to smooth the data. The key 
parameters in doing so are smoothness and anisotropy. The smoothness pa- 
rameter controls the average smoothness of the surface and the anisotropy 
parameter controls the relative degree of smoothing in the age and time 
coordinates. 

3. Case W revisited. Over an observation period of about 1600 days the 
workforce at a firm was reduced by about two thirds; 103 employees were 
involuntarily terminated in the process. A new CEO took control at day 862, 
near the middle of the observation period. The plaintiff asserted that em- 
ployees aged 50 and above were targeted for termination under the influence 
of the new CEO. Here we present a fully Bayesian analysis with smoothly 
time- and age-varying odds ratio. The personnel data were aggregated by 
status (involuntarily terminated, other) into one-week time intervals and 
two-year age intervals (20-21, 22-23, . . . , 64-65). Figures 1 and 2 show pos- 
terior medians and posterior probabilities of age-related discrimination (i.e., 
of increased odds of termination relative to unprotected employees). 

3.1. Forming an opinion about smoothness and anisotropy. The anisotropy
parameter $\rho$ governs smoothness in time relative to smoothness in age. This is
clearly illustrated in Figure 3, which shows the seventh eigensurface (basis
function) for (a) the isotropic case, where there is about one cycle in either
direction, in contrast to (b) the anisotropic case $\rho = 4$, in which the surface
is four times rougher in the age dimension (there are about 3 half cycles in
the age dimension to about 3/4 of a half cycle in the time dimension).

In the context of employment discrimination, we think that, in terms 
of roughness of the logit, a 3-year age difference is about equivalent to a 
business quarter. Recalling that we have rescaled 1600 calendar days and a 
45-year age span into unit intervals, a quarter is 0.056 and a three-year age 
interval is 0.067 of the unit interval, corresponding to anisotropy $\rho = 1.2$.
We have found empirically that doubling or halving anisotropy has a fairly 
modest effect on surface shape; consequently, we used the prior distribution 
shown in Table 2, which has prior geometric mean 1.4. 
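The arithmetic behind these two numbers is easy to reproduce. The sketch below takes a business quarter as 90 days (which gives the 0.056 quoted above) and computes the anisotropy implied by judging 3 years of age to be as rough as one quarter, together with the prior geometric mean $\exp(E[\ln\rho])$ of the Table 2 distribution; the function names are hypothetical.

```python
import math


def anisotropy_from_equivalence(age_years, age_span_years, time_days, time_span_days):
    """Anisotropy implied by judging `age_years` of age difference to be as rough as
    `time_days` of calendar time, after rescaling both axes to [0, 1]."""
    return (age_years / age_span_years) / (time_days / time_span_days)


def geometric_mean(values, probs):
    """Prior geometric mean exp(E[log rho]) of a discrete distribution."""
    return math.exp(sum(p * math.log(v) for v, p in zip(values, probs)))


rho = anisotropy_from_equivalence(3, 45, 90, 1600)   # three years ~ one 90-day quarter
gm = geometric_mean([8, 4, 2, 1, 0.5, 0.25], [0.08, 0.16, 0.26, 0.26, 0.16, 0.08])
print(round(rho, 2), round(gm, 2))
```

This prints 1.19 and 1.41; the Table 2 weights are symmetric in $\log_2\rho$ around 0.5, so the geometric mean is exactly $\sqrt{2}$. With the weights of the alternate prior in Table 6, the same `geometric_mean` gives about 4.09, and `anisotropy_from_equivalence(10, 45, 90, 1600)` gives about 3.95, the "decade equals a quarter" judgment.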

Fig. 3. Effect of anisotropy on the 7th basis function. Panels: (a) ρ = 1.0; (b) ρ = 4.0.

As in our earlier analysis of this case [Kadane and Woodworth (2004)],
we now derive a prior distribution for the smoothness parameter from our
belief that the odds ratio on termination for a 10-year age difference is
unlikely to change by more than 15% over a business quarter. This implies that
a particular mixed difference is unlikely to exceed 0.15 in absolute value;
that is, $\mathrm{Prior}(|\Delta_t^2\Delta_a\beta(t_0, a_0)| < 0.15)$ is large, where

$$\begin{aligned}\Delta_t^2\Delta_a\beta(t_0,a_0) ={}& \beta(t_0+2d_t,\,a_0+d_a) - 2\beta(t_0+d_t,\,a_0+d_a) + \beta(t_0,\,a_0+d_a)\\ &- \beta(t_0+2d_t,\,a_0) + 2\beta(t_0+d_t,\,a_0) - \beta(t_0,\,a_0),\end{aligned}$$

where $d_t$ is a rescaled half-quarter and $d_a$ is a rescaled decade. We have
from equation (2.9) that the prior distribution of $\Delta_t^2\Delta_a\beta(t_0,a_0)$ is normal
with mean zero and conditional variance $d'Hd/\lambda = V_\rho/\lambda$, where $H$ is the
matrix with entries $H(T_i - T_{i'},\ A_j - A_{j'})$, $d$ is the vector $(1, -2, 1, -1, 2, -1)$,
$T_i = (t_0 + i\,d_t)/\sqrt{1+\rho^2}$, $i = 0, 1, 2$, and $A_j = \rho(a_0 + j\,d_a)/\sqrt{1+\rho^2}$, $j = 0, 1$.
Values of $V_\rho$ are listed in Table 3.
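The quadratic form $V_\rho = d'Hd$ involves only six grid points, so it can be evaluated directly. In the sketch below the step sizes are inputs; the example values (half of a 90-day quarter on a 1,600-day span, and a decade on a 45-year span) are our reading of "rescaled half-quarter" and "rescaled decade", and the sketch is not claimed to reproduce the $V_\rho$ column of Table 3. The function names are hypothetical.

```python
import math


def tps_h(dx, dy):
    """Thin-plate kernel H(v) = |v|^2 ln|v| / (8 pi), with H(0) = 0."""
    r2 = dx * dx + dy * dy
    return 0.0 if r2 == 0.0 else r2 * 0.5 * math.log(r2) / (8.0 * math.pi)


def mixed_difference_variance(rho, dt, da, t0=0.0, a0=0.0):
    """V_rho = d'Hd for the mixed difference Delta_t^2 Delta_a beta(t0, a0)."""
    c = math.sqrt(1.0 + rho * rho)
    # Grid points in the order matching d = (1, -2, 1, -1, 2, -1).
    pts = [(t0 + 2 * dt, a0 + da), (t0 + dt, a0 + da), (t0, a0 + da),
           (t0 + 2 * dt, a0), (t0 + dt, a0), (t0, a0)]
    d = [1, -2, 1, -1, 2, -1]
    iso = [(t / c, rho * a / c) for t, a in pts]        # (T_i, A_j) coordinates
    return sum(d[k] * d[m] * tps_h(iso[k][0] - iso[m][0], iso[k][1] - iso[m][1])
               for k in range(6) for m in range(6))


# Assumed step sizes, for illustration only.
v = mixed_difference_variance(rho=1.0, dt=45 / 1600, da=10 / 45)
print(v)
```

The vector $d$ sums to zero and annihilates linear functions of $(t, a)$, so (2.9) applies; the result is positive, as a variance must be, and does not depend on $(t_0, a_0)$ because $H$ depends only on differences between points.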

The conditional prior distribution of the smoothness parameter given the
anisotropy parameter is gamma, with shape and scale parameters
selected so that $\mathrm{Prior}(|\Delta_t^2\Delta_a\beta(t_0,a_0)| < 0.15) = 1 - \alpha$ is large. To complete
the derivation, we have, conditional on $\rho$, that

$$[\Delta_t^2\Delta_a\beta(t_0,a_0)]^2 \sim V_\rho\,sc_\rho\,\frac{\Gamma(0.5)}{\Gamma(sh_\rho)} = V_\rho\,sc_\rho\,\frac{1-\beta(sh_\rho,0.5)}{\beta(sh_\rho,0.5)},$$

where, abusing the notation somewhat, we let $\Gamma(sh)$ denote an independent
gamma-distributed random variable with shape parameter $sh$, and let
$\beta(sh, 0.5)$ denote a beta-distributed random variable. Consequently, if

$$\mathrm{Prior}\left([\Delta_t^2\Delta_a\beta(t_0,a_0)]^2 < 0.15^2\right) = 1 - \alpha,$$

then

$$sc_\rho := \frac{0.15^2\,\beta_\alpha(sh_\rho,0.5)}{V_\rho\,[1 - \beta_\alpha(sh_\rho,0.5)]},$$

where $\beta_\alpha(sh_\rho, 0.5)$ is the $\alpha$th quantile of the $\beta(sh_\rho, 0.5)$ distribution. The
third column of Table 3 shows the values of the scale parameter $sc_\rho$ that
we used to compute the surfaces in Figures 1 and 2.

Table 2

Prior distribution of the anisotropy parameter

ρ        8      4      2      1      0.5    0.25
Prior    0.08   0.16   0.26   0.26   0.16   0.08

Larger ρ values favor smoothness in time.

Table 3

Prior variance $V_\rho$ of $\Delta_t^2\Delta_a\beta(t_0,a_0)$ and prior scale parameter of $\lambda$

Anisotropy ρ    V_ρ         sc_ρ for sh_ρ = 0.5 and α = 0.05
8               0.000383    5.04
4               0.000453    4.26
2               0.000492    3.93
1               0.000449    4.30
0.5             0.000332    5.81
0.25            0.000195    9.90
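The logic of choosing the gamma hyperparameter so that the mixed difference stays below 0.15 with prior probability $1-\alpha$ can be checked by simulation. The sketch fixes its own convention, $\lambda \sim \mathrm{Gamma}(\text{shape } 0.5, \text{rate } r)$ with $\chi^2_1 = 2\,\Gamma(0.5)$ for unit-scale gammas; since constant factors depend on exactly how the gamma scale is defined, the value it returns need not coincide with the $sc_\rho$ column of Table 3. For shape 0.5 the $\alpha$-quantile of $\beta(0.5, 0.5)$ has the closed form $\sin^2(\pi\alpha/2)$, so only the Python standard library is needed; the function names are hypothetical.

```python
import math
import random


def gamma_rate_for_bound(v_rho, bound=0.15, alpha=0.05):
    """Rate r such that, with lambda ~ Gamma(shape 0.5, rate r) and
    X | lambda ~ N(0, v_rho / lambda), Prior(|X| < bound) = 1 - alpha.

    With unit-scale gammas G1, G2 of shape 0.5, X^2 = 2 v_rho r G1 / G2, and
    B = G2 / (G1 + G2) ~ Beta(0.5, 0.5), whose alpha-quantile is sin^2(pi alpha / 2)."""
    q = math.sin(math.pi * alpha / 2.0) ** 2
    return bound**2 * q / (2.0 * v_rho * (1.0 - q))


def prior_coverage(v_rho, rate, bound=0.15, n=100_000, seed=1):
    """Monte Carlo estimate of Prior(|X| < bound) under the same hierarchy."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # gammavariate takes (shape, scale); guard against an exact zero draw.
        lam = max(rng.gammavariate(0.5, 1.0 / rate), 1e-300)
        hits += abs(rng.gauss(0.0, math.sqrt(v_rho / lam))) < bound
    return hits / n


r = gamma_rate_for_bound(0.000449)        # V_rho for rho = 1 in Table 3
print(r, prior_coverage(0.000449, r))     # coverage should be close to 0.95
```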

3.2. Computing the posterior distribution. To estimate this model, we
included enough basis vectors in the last row of equation (2.8) to account
for at least 95% of the total roughness variance a priori (i.e., we included
basis vectors accounting for 95% of the sum of the eigenvalues of $K_\rho$). We
computed the posterior distribution of the probabilities of involuntary
termination, and of the odds ratios relative to under-40 employees in each
time-age bin, using a program written in the SAS/IML language. For a given
anisotropy value $\rho$, we used the Metropolis-Hastings algorithm with iteratively
reweighted least squares proposals, as proposed by Gamerman (1997), to separately
update the logistic regression coefficient vectors $\phi$ and $\delta$, and a Gibbs
step to update the smoothness parameter $\lambda$. Anisotropy values were chosen
from the six shown in Table 2: beginning with an arbitrary initial
value, we attempted a jump from the current anisotropy value to an adjacent
value, with transition probabilities from the $6 \times 6$ doubly stochastic
matrix shown in Table 4. Letting the current parameter values be $\delta$, $\phi$, $\lambda$ and
$\rho$, we attempt a reversible jump $\rho \to \tilde\rho$. We then propose the values $\tilde\phi = \phi$ and
$\tilde\lambda = \lambda\cdot\widetilde{sc}/sc$, where $sc$ and $\widetilde{sc}$ are the scale parameters from Table 3
corresponding to $\rho$ and $\tilde\rho$, respectively. Finally, we generate a proposal for $\tilde\delta$ as follows.
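Let $\beta = B_\rho\delta + L\phi$ be the current logit vector, let $p$ be the current
vector of termination probabilities in the time-age bins [i.e., $\operatorname{logit}(p) = \beta$], and
let $q = 1 - p$. Let vectors $n$ and $y$ be the numbers at risk and terminated
in the time-age bins, and let $W = \operatorname{diag}(npq)$, with products taken elementwise.
Then $\tilde\delta$ is proposed from the multivariate normal distribution with precision
$\tilde\Pi = \tilde\lambda I + \tilde B'W\tilde B$ and mean $\tilde\mu = \tilde\Pi^{-1}\tilde B'W\tilde y$, where $\tilde B$ is the matrix of basis
vectors corresponding to anisotropy $\tilde\rho$, as defined in Section 2.3, and
$\tilde y = B_\rho\delta + (y - np)/(npq)$. The proposal is accepted with probability

$$\begin{aligned} a &= \min\left\{1,\ \frac{p(\tilde\rho)\,p(\tilde\lambda\mid\tilde\rho)\,p(\tilde\delta\mid\tilde\lambda)\,l(\tilde\beta)\;p(\tilde\rho\to\rho)\,q(\delta\mid\tilde\lambda,\tilde\delta,\phi)}{p(\rho)\,p(\lambda\mid\rho)\,p(\delta\mid\lambda)\,l(\beta)\;p(\rho\to\tilde\rho)\,q(\tilde\delta\mid\lambda,\delta,\phi)}\,\frac{d\tilde\lambda}{d\lambda}\right\}\\ &= \min\left\{1,\ \frac{p(\tilde\rho)\,\tilde\lambda^{\tilde q/2}\exp(-\tfrac12\tilde\lambda\tilde\delta'\tilde\delta)\,l(\tilde\beta)\;|\Pi|^{0.5}\exp[-\tfrac12(\delta-\mu)'\Pi(\delta-\mu)]}{p(\rho)\,\lambda^{q/2}\exp(-\tfrac12\lambda\delta'\delta)\,l(\beta)\;|\tilde\Pi|^{0.5}\exp[-\tfrac12(\tilde\delta-\tilde\mu)'\tilde\Pi(\tilde\delta-\tilde\mu)]}\right\},\end{aligned}$$

where $l(\beta)$ is the likelihood function [equation (2.1)], evaluated at $\tilde\beta = \tilde B\tilde\delta + L\phi$
for the proposal, $\tilde q$ and $q$ are the ranks of $\tilde B$ and $B_\rho$, and $\mu$ and $\Pi$ are the mean
and precision of the reverse proposal [Green (1995)].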
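Three pieces of the sampler described above lend themselves to a short NumPy sketch: the tridiagonal jump proposal of Table 4 (symmetric, so its ratio drops out of the acceptance probability), a conjugate Gibbs draw of $\lambda$, and a Gamerman-style IWLS Gaussian proposal for $\tilde\delta$. The Gibbs step assumes $\lambda \sim \mathrm{Gamma}(\text{shape}, \text{rate})$ with $\delta \mid \lambda \sim N(0, \lambda^{-1}I)$; that rate parameterization, the elementwise weights $n p q$, and all function names are our assumptions rather than the authors' SAS/IML code.

```python
import numpy as np


def jump_matrix(k=6, step=0.1):
    """Tridiagonal, symmetric, doubly stochastic jump proposal over k ordered values."""
    J = np.zeros((k, k))
    idx = np.arange(k - 1)
    J[idx, idx + 1] = step
    J[idx + 1, idx] = step
    J[np.arange(k), np.arange(k)] = 1.0 - J.sum(axis=1)   # 0.9 at the ends, 0.8 inside
    return J


def gibbs_lambda(delta, shape, rate, rng):
    """Draw lambda | delta for lambda ~ Gamma(shape, rate), delta | lambda ~ N(0, I/lambda)."""
    return rng.gamma(shape + 0.5 * delta.size, 1.0 / (rate + 0.5 * delta @ delta))


def iwls_proposal(B_new, B_cur, delta_cur, offset, n, y, lam_new, rng):
    """One draw of the Gaussian proposal for delta under basis B_new, linearized at
    the current logits beta = B_cur @ delta_cur + offset (offset = L @ phi)."""
    beta = B_cur @ delta_cur + offset
    p = 1.0 / (1.0 + np.exp(-beta))
    w = n * p * (1.0 - p)                                  # IWLS weights n p q
    resid = np.divide(y - n * p, w, out=np.zeros_like(w), where=w > 0)
    z = B_cur @ delta_cur + resid                          # working response
    prec = lam_new * np.eye(B_new.shape[1]) + B_new.T @ (w[:, None] * B_new)
    mean = np.linalg.solve(prec, B_new.T @ (w * z))
    C = np.linalg.cholesky(prec)                           # prec = C C'
    draw = mean + np.linalg.solve(C.T, rng.standard_normal(mean.size))
    return draw, mean, prec


def log_proposal_density(x, mean, prec):
    """log N(x; mean, prec^{-1}) without the (2 pi)^(-q/2) constant."""
    _, logdet = np.linalg.slogdet(prec)
    r = x - mean
    return 0.5 * logdet - 0.5 * r @ prec @ r
```

In the acceptance ratio, `log_proposal_density` supplies the forward and reverse proposal terms, while the prior and likelihood terms come from the model; the $(2\pi)^{-q/2}$ constants cancel there, which is why the sketch drops them.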



3.3. Sensitivity analysis. It is good statistical practice to investigate
whether and to what extent the results of an analysis are sensitive to the
prior distribution; in this case, that means investigating the influence of the
prior distributions of the smoothness and anisotropy parameters. Figures 1
and 2 above are based on our preferred prior distribution as specified in
Tables 2 and 3. In Figure 4 we compare Figure 1 (panel a) with an analysis
(panel b) in which the scale parameters in Table 3 are multiplied by 10,
decreasing the roughness penalty by a factor of 10 and producing a
substantially rougher surface. Figure 5 shows the effect of this variation on
the probability of discrimination.



3.4. Identification of the anisotropy parameter. Table 5 shows the
marginal posterior distribution of the anisotropy parameter for the preferred
prior distribution of the smoothness parameter (Table 3). The posterior
probability $P(\rho\mid\mathrm{Data})$ is the observed rate of sampler visits to value $\rho$
of the anisotropy parameter in 19,000 replications, the marginal likelihood
is $P(\rho\mid\mathrm{Data})/P(\rho) \propto P(\mathrm{Data}\mid\rho)$, and $P_{0.025}$ and $P_{0.975}$ are nominal
Monte-Carlo error bounds computed on the assumption that the observed rate has
a binomial distribution.

Table 4

Jump proposal probabilities for the anisotropy parameter

Anisotropy   8      4      2      1      0.5    0.25
8            0.9    0.1    0      0      0      0
4            0.1    0.8    0.1    0      0      0
2            0      0.1    0.8    0.1    0      0
1            0      0      0.1    0.8    0.1    0
0.5          0      0      0      0.1    0.8    0.1
0.25         0      0      0      0      0.1    0.9
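The quantities in Table 5 follow from visit counts by simple arithmetic. The sketch below uses a normal approximation to the binomial with 1.96 standard errors, which is our assumption and ignores autocorrelation in the sampler, so it is only a rough guide to the bounds in Table 5. The visit count in the example is hypothetical, chosen to match the rate reported for $\rho = 8$ (0.122 of 19,000 visits).

```python
import math


def visit_summary(visits, total, prior, z=1.96):
    """Posterior visit rate for one anisotropy value, normal-approximation binomial
    bounds on that rate, and the implied marginal-likelihood ratio rate / prior
    (with the bounds divided by the prior in the same way)."""
    rate = visits / total
    half_width = z * math.sqrt(rate * (1.0 - rate) / total)
    low, high = rate - half_width, rate + half_width
    return rate, (low, high), rate / prior, (low / prior, high / prior)


# Hypothetical count consistent with the rho = 8 row of Table 5.
rate, bounds, ml, ml_bounds = visit_summary(2318, 19000, prior=0.08)
print(f"rate {rate:.3f}, bounds {bounds[0]:.3f}-{bounds[1]:.3f}, marginal {ml:.3f}")
```

Dividing by the prior probability is what turns a visit rate into a quantity proportional to $P(\mathrm{Data}\mid\rho)$: values above 1 mean the data moved probability toward that anisotropy.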

It is clear from the marginal likelihood that the data carry information
about anisotropy and, in particular, that models with small values of $\rho$ (i.e.,
models that are very rough in the time dimension) are disconfirmed by the data.
However, high levels of smoothness in the time dimension are not disconfirmed
by the data and apparently must be discouraged by the prior. Because
of this, we investigated the effect of a prior that forces more smoothness in
the time dimension.

For Figure 6 we altered the prior distribution of the anisotropy parameter
to favor smoothness in the time dimension (Table 6). In this case the prior
geometric mean of the anisotropy parameter is about 4, meaning that we
think that, in terms of roughness of the log odds on termination, a decade
of age is about equivalent to a business quarter (see Section 3.1). Evidence
of discrimination in the plaintiff's case (the intersection of the dashed and
black ribbons) is slightly stronger for the prior that forces more smoothness
in the time dimension: $P(\mathrm{OR} > 1\mid\mathrm{Data})$ is about 0.79 for the preferred prior
(panel a) and about 0.83 for the more time-smoothing prior (panel b).

Fig. 4. Effect of the smoothness prior on the log odds ratio. Panels: (a) preferred smoothness prior; (b) smoothness scale parameter ×10.

Table 5

Posterior distribution and marginal likelihood of the anisotropy parameter

                 Posteriorᵃ                       Marginal likelihood
ρ      Prior    P(ρ|Data)   P0.025   P0.975      ∝ P(Data|ρ)   P0.025   P0.975
8      0.08     0.122       0.12     0.13        1.53          1.47     1.61
4      0.16     0.231       0.22     0.24        1.44          1.40     1.50
2      0.26     0.286       0.28     0.30        1.10          1.07     1.14
1      0.26     0.217       0.21     0.23        0.83          0.81     0.87
0.5    0.16     0.101       0.10     0.11        0.63          0.61     0.67
0.25   0.08     0.043       0.04     0.05        0.54          0.50     0.59

ᵃ P0.025 and P0.975 are Monte-Carlo error bounds (see text).

Although the analysis in panel (b) is more favorable to the plaintiff, we 
think it would be less persuasive to the trier(s) of fact (judge or jury) since 
it does not seem to distinguish between the periods before and after the 
arrival of the new CEO (day 862). 

3.5. Previous analyses of case W. The plaintiff, who was between 50 and
59 years of age, was one of 12 employees involuntarily terminated on day
1092. He brought an age discrimination suit against the employer under the
theory that the new CEO had a pattern of targeting employees aged 50 and
above for termination.



Fig. 5. Effect of the smoothness prior on the posterior probability of discrimination. Panels: (a) preferred smoothness prior; (b) smoothness scale parameter ×10. Horizontal axis: Day.

Fig. 6. Effect of the anisotropy parameter on the posterior probability of discrimination. Panels: (a) prior mean ρ = 1.4; (b) prior mean ρ = 4.1.

In the original case, the plaintiff's statistical expert tabulated involuntary 
termination rates for each calendar quarter and each age decade. He reported 
that, "[Involuntary] separation rates for the [period beginning at day 481] 
averaged a little above three percent of the workforce per quarter for ages 
20 through 49, but jumped to six and a half percent for ages 50 through 
59. The 50-59 year age group differed significantly from the 20-39 year age 
group (signed-rank test, p = 0.033, one-sided)." The plaintiff alleged and the
defendant denied that the new CEO had vowed to weed out older employees. 
The case was settled before trial. 

In a subsequent re-analysis [Kadane and Woodworth (2004)], we employed
a proportional hazards model with separate, smoothly time-varying log hazard
ratios for ages 40-49 and 50-64, with ages 20-39 as the reference category.
Thus, the log hazard ratio was smooth over time but piecewise constant
over age; Figure 7 is reproduced with permission from that paper. Our
preferred model, represented by the solid curves, had prior mean smoothness
0.007. For this prior the posterior probability of age discrimination in the
case of Plaintiff W1 was 0.842.

Table 6

Alternate prior distribution of the anisotropy parameter

ρ        8      4      2       1        0.5       0.25
Prior    0.5    0.25   0.125   0.0625   0.03125   0.03125

Larger ρ values favor smoothness in time.

Fig. 7. Smooth by piecewise constant proportional hazards model. Horizontal axis: Day.

The model depicted in Figure 7 has two explanatory variables for age,
an indicator variable for age in the range 40-49 and an indicator variable
for age 50 and above (there are no employees 65 and over in the data set).
The likelihood model was proportional hazards regression with smoothly 
time- varying coefficients for the two explanatory variables. Three analyses 
are shown here with different prior means for the smoothness parameter, 
A. The upper panel shows posterior means of the proportional-hazards re- 
gression coefficients as functions of time and the smoothness parameter. As 
suggested in the figure, the regression coefficients are interpretable as in- 
stantaneous log-odds ratios with unprotected, under-40, employees as the 
reference category. The second panel presents posterior probabilities that 
the two regression coefficients are positive; that is, that the termination rate 
is higher for the protected subclasses compared to the unprotected class. 
For example, at the time of plaintiff W2's termination, the posterior prob- 
ability exceeds 80% that employees age 50 and above had a higher risk of 
termination than the protected class. 
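In a sampling-based fit, each posterior probability in the second panel of Figure 7 is the share of posterior draws of a coefficient that exceed zero. A minimal sketch, assuming posterior draws are already available for each day. The Gaussian draws below are hypothetical stand-ins for MCMC output, and the helper names are ours.

```python
import random

def prob_positive(draws):
    """Monte Carlo estimate of P(beta > 0 | data): share of posterior draws above 0."""
    return sum(1 for b in draws if b > 0) / len(draws)

def prob_positive_by_day(draws_by_day):
    """Apply prob_positive day by day, giving a curve over time."""
    return {day: prob_positive(d) for day, d in sorted(draws_by_day.items())}

# Hypothetical posterior draws of the age-50+ coefficient on three days.
rng = random.Random(1)
draws_by_day = {
    day: [rng.gauss(mu, 0.5) for _ in range(4000)]
    for day, mu in [(481, 0.1), (733, 0.4), (1275, 0.9)]
}
print(prob_positive_by_day(draws_by_day))
```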

A second plaintiff, W2, aged 60 and terminated on day 733, also brought an
age-discrimination suit on the theory that employees aged 60 and above were
disproportionately targeted at the time of his termination. On that day three
of eight employees (37.5%) aged 60 and up were terminated, compared with 15
of 136 (11.0%) employees in all other age groups (one-sided
Fisher exact test p = 0.0530). In our earlier re-analysis the posterior probability of
age discrimination against employees aged 50-64 was about 50%, but that model did not
distinguish between employees aged 50-59 and 60-64. Our second re-analysis,
reported in this paper, remedies that deficiency and gives a more detailed
picture of the impact of age on the risk of discrimination; in particular, for
our preferred prior, the posterior probability of age discrimination against
60-year-old employees on day 733 is about 65% but is only about 37% for
50-year-old employees.

3.6. Summary. Table 7 summarizes the results of the three analyses of
case W for each of the two plaintiffs. In the first (classical) analysis for
Plaintiff W1, it is assumed that each employee in the age groups 20-39 and
50-59 has the same chance of being involuntarily terminated (i.e., fired)
in each quarter-year after day 481. The test of significance calculates the
probability of obtaining data as or more extreme than those observed were
it true that persons in these two age groups have the same chance of being
fired in any given quarter. The classical analysis for plaintiff W2 is somewhat
different, in that it focuses solely on what happened on the day that W2
was fired. It conditions on both the age distribution of the workforce at the
time (eight of 144 employees 60 years old or older) and the number fired (18)
and computes the probability of three or more of the eight older employees
being fired, if all employees were equally likely to be fired.
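The classical calculation for plaintiff W2 described above is an upper tail of the hypergeometric distribution, which is what a one-sided Fisher exact test computes. A minimal sketch using only the Python standard library; the function name `hypergeom_upper_tail` is ours.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, x):
    """P(X >= x), where X counts members of a K-person subgroup among n people
    chosen at random without replacement from N (one-sided Fisher exact test)."""
    denom = comb(N, n)
    numer = sum(comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1))
    return numer / denom

# W2's day as described in the text: 144 employees, 8 aged 60 or older,
# 18 fired in total, 3 of them aged 60 or older.
print(hypergeom_upper_tail(N=144, K=8, n=18, x=3))
```

As a sanity check, `hypergeom_upper_tail(8, 4, 4, 3)` returns 17/70 (about 0.243), the upper tail in Fisher's tea-tasting example.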



Table 7

Summary of three analyses of Case W

Analysis | Method | Figure of merit | Treatment of age | Age × time interaction | Plaintiff W1 | Plaintiff W2
Original expert's report | Frequentist | p-value | categorical: 40-up | none | 0.033 | 0.053
Kadane and Woodworth (2004) | Bayesian | probability of disproportional disadvantage | categorical: 40-49, 50-64 | smooth | 0.84 | 0.50
Kadane and Woodworth (2004) | Bayesian | probability of disproportional disadvantage | categorical: 40-49, 50-64 | smooth w/ discontinuity at day 862 | 0.88 | 0.49
This paper | Bayesian | probability of disproportional disadvantage | smooth | smooth | 0.65 | 0.37
Anonymous referee of this paper | Cox regression | p-value, OR, and 90% LCL | linear above 40 | none, but restricted to day 1000 and up | p: 0.041; OR: 2.04; LCL: 1.01 | n/a

OR = odds ratio; LCL = lower confidence limit.




The second analysis is based on a model for the log odds of being fired
that is continuous in time but still assumes a constant effect within each age category. The
analysis of this paper relaxes this latter assumption and assumes only smoothness
in both age and time. In both Bayesian analyses, the probability computed
is that an employee of a given age was more likely to be fired at a particular
time than was an employee in the unprotected 20-39 age group.

Although the classical analyses are computing probabilities in the sample 
space while the Bayesian analyses are computing probabilities in the param- 
eter space, the stronger effect here appears to be that as the assumptions 
get less rigid, there is less certainty that these plaintiffs' cases were meri- 
torious, as Table 7 shows. In view of the tendency of Bayesian analyses to 
draw estimates toward each other, this is perhaps not too surprising. 
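The tendency of Bayesian analyses to pull estimates toward each other can be seen in the simplest hierarchical setting. This is a toy normal-normal model with a known common prior mean and variance, not the spline prior used in this paper; `shrink` is a hypothetical helper.

```python
def shrink(estimates, sampling_vars, prior_mean, prior_var):
    """Posterior means for x_i ~ N(theta_i, s_i^2) with a common prior
    theta_i ~ N(mu, tau^2): each estimate moves toward mu, with weight
    tau^2 / (tau^2 + s_i^2) left on the data."""
    out = []
    for x, s2 in zip(estimates, sampling_vars):
        w = prior_var / (prior_var + s2)  # weight on the observed estimate
        out.append(w * x + (1 - w) * prior_mean)
    return out

print(shrink([2.0, -2.0], [1.0, 1.0], 0.0, 1.0))  # → [1.0, -1.0]
```

With two equally noisy estimates of +2 and -2, a prior centered at 0, and prior variance equal to the sampling variance, each estimate is pulled halfway to 0; the noisier an estimate, the further it is pulled.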

4. Discussion. In a nonhierarchical model, the effect of the prior can be 
isolated by separately reporting the likelihood function and the prior dis- 
tribution. In particular, if the parameter space is divided into two disjoint 
subsets, the likelihood ratio and the prior odds suffice. However, in a hi- 
erarchical model such as this one, such a separation is not possible. For 
this reason, we have reported the results of changing our prior directly, in 
Sections 3.3, 3.4 and 3.5. 
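In the simplest nonhierarchical case of two simple hypotheses, a reported likelihood ratio lets each reader supply their own prior odds and recover a posterior probability by Bayes' rule. This two-point case is our simplification: with composite subsets, the likelihood ratio would itself depend on how the prior spreads within each subset. `posterior_prob` is a hypothetical helper.

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Posterior probability of H1 from P(H1) and the likelihood ratio
    L(data | H1) / L(data | H0): posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1 + post_odds)

print(posterior_prob(0.5, 3.0))  # → 0.75
```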

We have presented a global analysis of involuntary terminations that in- 
corporates all of the data but reflects fine-grained variations over time and 
age of employee. The results are somewhat sensitive to assumptions about the
prior distribution of the smoothness parameter, although not enough to
materially alter the strength of evidence supporting the plaintiff's discrim- 
ination claim in Case W. This analysis, in our view, casts new light on the 
apparent patterns in coarser-grained descriptive presentations that might be 
easier for nonspecialists to grasp. 

Our intent is to develop a methodology that does not require complex as- 
sumptions about the relationship between time, age and risk of termination. 
Indeed, the only structural assumption is smoothness and the only prior 
opinion required has to do with the degree of smoothness. We have sug- 
gested how that prior opinion could be elicited by considering how rapidly 
the risk of termination is likely to change over a business quarter and over 
a decade of age. A referee described our analysis as "staggeringly complex" 
and "shuddered to think what a judge or jury would make of this approach." 
All statistical analyses are "staggeringly complex" to most laypersons. We 
think our responsibility as statisticians (and experts in court) is to present 
our best analysis of the data and to explain it as well as we can.

A global analysis such as this one is more powerful and more appropri- 
ate than analyzing subsets of the data, perhaps in the form of individual 
termination waves or individual business quarters, and more appropriate 
than analyzing coarse aggregations such as employees aged 40 and above 
compared to younger employees. The fallacy of subdividing the data is that
such analyses implicitly assume that there is no continuity in the behavior of
a firm and no difference in treatment of employees of different ages within the
same broad age category (40 and older). We believe that the appropriate
approach to possible inhomogeneities of the age effect is to incorporate them in
a global model; see, for example, our discussion of Gastwirth's (1992)
analysis in Valentino v. United States Postal Service [Gastwirth (1992), Kadane
and Woodworth (2004)].

Finally, it has not escaped our notice that our analysis of Case W has 
made it clear that only a subgroup of older employees, centered around 
the peak at day 1275 and age 54-55, has even moderately strong statistical 
evidence to support a claim of age discrimination. We believe that this is 
precisely the information that the court needs in order to determine how an 
award (if any) should be distributed among members of a certified class. 

SUPPLEMENTARY MATERIAL 

Supplement A: Employment — Case W (DOI: 10.1214/10-AOAS330SUPPA; 
.txt). Data from two cases described in the paper "Hierarchical models for 
employment decisions," by Kadane and Woodworth. A constant number of 
days has been subtracted from each date to preserve confidentiality. 

Supplement B: Code for calculations (DOI: 10.1214/10-AOAS330SUPPB; 
.zip). 